Wildfires in 2023

Spatial Clustering using Satellite Imagery

Bogdan G. Popescu

John Cabot University

Intro

  • In 2023, the world experienced a significant increase in the frequency of wildfires

  • Wildfires pose challenges to ecosystems, communities, and firefighting efforts

Objectives

This exercise aims to map all wildfires that occurred in 2023 on a month-by-month basis.

It also performs statistical analyses to identify statistically significant spatial clusters.

Data

  • The data are based on the Terra and Aqua combined MCD64A1 Version 6.1 Burned Area data product

  • This is a monthly, global gridded 500m product containing per-pixel burned-area and quality information

Maps for the Whole World: January 2023

Maps for the Whole World: July 2023

Maps for the Whole World: August 2023

Maps for the Whole World: September 2023

Maps for Europe: June 2023

Maps for Europe: July 2023

Maps for Europe: August 2023

Maps for Europe: September 2023

Analyzing Spatial Clusters

To analyze clusters, I use the Getis-Ord Gi* statistic.

The Getis-Ord Gi* statistic produces z-scores and p-values, showing where features with either high or low values cluster spatially.

The Getis-Ord Gi* Statistic

\[ G_i^* = \frac{\sum_{j=1}^{n} w_{ij}x_j - \bar{x} \sum_{j=1}^{n} w_{ij}}{S \sqrt{\frac{n \sum_{j=1}^{n} w_{ij}^2 - \left(\sum_{j=1}^{n} w_{ij}\right)^2}{n-1}}}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{x}^2} \] where:

  • \(G_i^*\) is the Gi* statistic for location i
  • n is the total number of locations
  • \(x_j\) is the attribute value at location j
  • \(\bar{x}\) is the mean of the attribute values
  • \(w_{ij}\) is the spatial weight between locations i and j
  • \(S\) is the standard deviation of the attribute values
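As a toy illustration (not part of the memo's pipeline), spdep's `localG` computes these z-scores; including each location in its own neighbor set is what turns Gi into Gi*:

```r
library(spdep)

# Toy 3x3 grid with one high-valued corner cell
coords <- as.matrix(expand.grid(x = 1:3, y = 1:3))
vals <- c(10, 1, 1, 1, 1, 1, 1, 1, 1)

# Rook-style neighbors: cells within distance 1
nb <- dnearneigh(coords, 0, 1.1)
# include.self() adds i to its own neighbor set (Gi -> Gi*)
w <- nb2listw(include.self(nb), style = "B")

# z-scores: large positive values flag clusters of high values
gi_star <- as.numeric(localG(vals, w))
which.max(gi_star)  # 1, the high-valued corner
```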

The Global G Test: Data in July


    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      9.031613e-04       9.054901e-04       1.121059e-12 

The output shows a standard deviate of -2.1995,

  • the observed clustering of wildfires is 2.1995 standard deviations below what would be expected under the null hypothesis of no clustering.

  • Because the test's alternative hypothesis is "greater," this negative deviate corresponds to a p-value of 0.9861, so the observed clustering is not statistically significant at the 0.05 level.
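The p-value is just the upper-tail normal probability of the standard deviate, since the alternative hypothesis is one-sided ("greater"); a quick sanity check in R:

```r
# One-sided ("greater") p-value from the July standard deviate
z <- -2.1995
p <- pnorm(z, lower.tail = FALSE)
round(p, 4)  # 0.9861
```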

The Global G Test: Data in August


    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      8.942119e-04       8.907281e-04       7.024831e-13 

The output shows a standard deviate of 4.1566,

  • The observed clustering of wildfires is 4.1566 standard deviations above what would be expected under the null hypothesis of no clustering.

  • This value is associated with a p-value of 1.615e-05, so the observed clustering is statistically significant at the 0.05 level.

The Hotspots in August

Conclusion

  • Spatial statistics can help identify areas where wildfires are clustered (hotspots)

  • This information is crucial for policymakers to prioritize resources and interventions.

  • Mapping the spatial distribution of wildfires with Gi* statistics can show policymakers where to allocate resources more effectively

Technical Appendix

  • The full code to replicate this study is in the appendix of the memo.

Part 1: Reading the Data

library(sf)
library(stars)
library(dplyr)
library(sfdep)
library(tidyr)
library(ggplot2)
#Step1: Turning spherical geometry off
sf::sf_use_s2(FALSE)
#Step2: Reading world borders
borders <- st_read(dsn="/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/week9/data/boundaries/all-country-boundaries.shp", quiet = TRUE)
#Step3: Simplifying borders
borders2<-st_simplify(borders, dTolerance = 0.1)
#Step4: Reading rasters with wildfires
left_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000000000.tif")
right_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000023296.tif")
#Step5: Mosaicking the wildfire rasters
july<-st_mosaic(left_hem, right_hem)

#Step4: Reading rasters with wildfires (August)
left_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000000000.tif")
right_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000023296.tif")
#Step5: Mosaicking the wildfire rasters (August)
august<-st_mosaic(left_hem, right_hem)


#Step6: Reducing resolution. Higher dx/dy values mean lower resolution
burn2_july<-st_warp(july, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))
#Step7: Turning the raster cells into polygons
burn3_july<-st_as_sf(burn2_july, as_points = FALSE, merge = FALSE)
names(burn3_july)[1]<-"burn"


#Step6: Reducing resolution. Higher dx/dy values mean lower resolution
burn2_august<-st_warp(august, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))
#Step7: Turning the raster cells into polygons
burn3_august<-st_as_sf(burn2_august, as_points = FALSE, merge = FALSE)
names(burn3_august)[1]<-"burn"
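A self-contained sketch of the downsample-and-polygonize idea in Steps 6 and 7, using synthetic values instead of the MCD64A1 mosaic:

```r
library(stars)
library(sf)

# Synthetic fine-resolution raster over a 10 x 10 degree box
bb <- st_bbox(c(xmin = 0, ymin = 0, xmax = 10, ymax = 10), crs = st_crs(4326))
fine <- st_as_stars(bb, dx = 0.05, dy = 0.05, values = runif(200 * 200))

# Downsample to 0.5-degree cells (higher dx/dy = lower resolution),
# then turn each cell into a polygon
coarse <- st_warp(fine, st_as_stars(bb, dx = 0.5, dy = 0.5))
cells <- st_as_sf(coarse, as_points = FALSE, merge = FALSE)
names(cells)[1] <- "burn"
nrow(cells)  # 400 half-degree cells
```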

Part 2: The Neighbor Analysis: July Data

library(spdep)
#Step1: Create a neighbor list based on queen contiguity
list_nb <- spdep::poly2nb(burn3_july, queen = TRUE)
#Step2: Checking for empty neighbor sets (polygons without neighbors).
empty_nb <- which(card(list_nb) == 0)
#Step3: Remove polygons with empty neighbor sets from the data
tes_subset <- burn3_july[-empty_nb, ]
#Step4: Identify neighbors with queen contiguity (edge/vertex touching)
tes_nb <- poly2nb(tes_subset, queen = TRUE)
#Step5: Binary weighting assigns a weight of 1 to all neighboring features 
# and a weight of 0 to all other features
tes_w_binary <- nb2listw(tes_nb, style="B")
#Step6:  Calculate spatial lag for burn
tes_lag <- lag.listw(tes_w_binary, tes_subset$burn)
#Step7: Test for global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)

    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      9.031613e-04       9.054901e-04       1.121059e-12 

Part 2: The Neighbor Analysis: August Data

library(spdep)
#Step1: Create a neighbor list based on queen contiguity
list_nb <- spdep::poly2nb(burn3_august, queen = TRUE)
#Step2: Checking for empty neighbor sets (polygons without neighbors).
empty_nb <- which(card(list_nb) == 0)
#Step3: Remove polygons with empty neighbor sets from the data
tes_subset <- burn3_august[-empty_nb, ]
#Step4: Identify neighbors with queen contiguity (edge/vertex touching)
tes_nb <- poly2nb(tes_subset, queen = TRUE)
#Step5: Binary weighting assigns a weight of 1 to all neighboring features 
# and a weight of 0 to all other features
tes_w_binary <- nb2listw(tes_nb, style="B")
#Step6: Calculate spatial lag for burn
tes_lag <- lag.listw(tes_w_binary, tes_subset$burn)
#Step7: Test for global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)

    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      8.942119e-04       8.907281e-04       7.024831e-13 

Calculating the Local Gi*

We can calculate the local Gi* to get a sense of the local wildfire hotspots

# Identify neighbors, create weights, calculate spatial lag
tes_nbs <- tes_subset %>% 
  mutate(
    nb = st_contiguity(geometry),        # neighbors share border/vertex
    wt = st_weights(nb),                 # row-standardized weights
    tes_lag = st_lag(burn, nb, wt)       # calculate spatial lag of burn
  ) 

Calculating the Local Gi*

# Calculate the Gi using local_g_perm
tes_hot_spots <- tes_nbs %>%
  mutate(
    Gi = local_g_perm(burn, nb, wt, nsim = 999)
    # nsim = number of Monte Carlo simulations (999 is the default)
  ) %>%
  # The new 'Gi' column is a nested data frame,
  # so we 'unnest' it to get regular columns
  unnest(Gi)
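A toy run (synthetic grid, not the wildfire data) showing the columns that `local_g_perm` returns after unnesting, including the `gi` and `p_folded_sim` columns used in the classification step:

```r
library(sf)
library(sfdep)
library(dplyr)
library(tidyr)

set.seed(1)
# Toy 5x5 polygon grid with a high-valued first row
grid <- st_make_grid(st_as_sfc(st_bbox(c(xmin = 0, ymin = 0,
                                         xmax = 5, ymax = 5))), n = c(5, 5))
dat <- st_sf(geometry = grid) %>%
  mutate(burn = c(rep(10, 5), rep(1, 20)))

res <- dat %>%
  mutate(
    nb = st_contiguity(geometry),
    wt = st_weights(nb),
    Gi = local_g_perm(burn, nb, wt, nsim = 199)
  ) %>%
  unnest(Gi)

c("gi", "p_folded_sim") %in% names(res)  # both TRUE
```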

Let’s do a cursory visualization of Gi values

Calculating the Local Gi*

We can zoom in even further

Final Hotspot in August

tes_hot_spots2 <- subset(tes_hot_spots, select = c(gi, p_folded_sim))
tes_hot_spots3 <- tes_hot_spots2 %>%
  mutate(
    # Add a new column called "classification"
    classification = case_when(
      # Classify based on the following criteria:
      gi > 0 & p_folded_sim <= 0.01 ~ "Very hot",
      gi > 0 & p_folded_sim <= 0.05 ~ "Hot",
      gi > 0 & p_folded_sim <= 0.1 ~ "Somewhat hot",
      gi < 0 & p_folded_sim <= 0.01 ~ "Very cold",
      gi < 0 & p_folded_sim <= 0.05 ~ "Cold",
      gi < 0 & p_folded_sim <= 0.1 ~ "Somewhat cold",
      TRUE ~ "Insignificant"
    ),
    # Convert 'classification' into a factor for easier plotting
    classification = factor(
      classification,
      levels = c("Very hot", "Hot", "Somewhat hot",
                 "Insignificant",
                 "Somewhat cold", "Cold", "Very cold")
    )
  )

# Note: min_lon and max_lon (the longitude bounds of the zoom area)
# must be defined before running this plot
ggplot() +
  geom_sf(data = tes_hot_spots3, color = NA, aes(fill = classification)) +
  scale_fill_brewer(type = "div", palette = 5) +
  geom_sf(data = borders, fill = NA) +
  coord_sf(xlim = c(min_lon, max_lon), ylim = c(-15, -5), expand = FALSE)

Final Hotspot in August